ABSTRACT

The model for educational measurement developed by 
George Rasch, a Danish psychometrician, is reviewed and its 
application to occupational educational testing discussed. The Rasch 
model is an adaptation from the theory of latent trait analysis. 
According to it, answering an item correctly is a function of the
difficulty of the item and the ability of the person being tested.
The raw scores serve as the basis for estimating the scale of
ability. The author concludes that the properties of the Rasch
analysis suggest solutions to a number of measurement problems in 
occupational education, including developing and equating alternate
forms of a test and estimating and interpreting changes in trainee 
performance. The item-free characteristics of this measurement model 
may allow the development of individually tailored tests. (DJ) 
OBJECTIVE MEASUREMENT IN OCCUPATIONAL EDUCATION
David L. Passmore* 

There is an increasing tendency for employers and educators to make use of
information accumulated by educational and psychological testing programs. In 
occupational education, there is a press for the development of means to measure 
and certify the competencies of teachers (Panitz & Olivo, 1970) as well as stu- 
dents (Baldwin, 1970; Ohio Trade and Industrial Education Services, 1970). Most 
test constructors in occupational education have chosen to be guided by classical 
measurement models even though these models may be inefficient in the light of
recent advances in testing technology. The purpose of this paper is to review 
a promising model for educational measurement which was developed by George 
Rasch, a Danish psychometrician. This model has strong implications for future 
measurement practices in occupational education.

Nature of the Problem 

Pervasive But Imprecise Nature of Measurement in Education 

Educational and psychological tests are pervasive phenomena on the American
educational scene. An estimated 250 million tests are administered yearly in 
the nation's schools (Brim, Glass, Neulinger, Firestone, & Lerper, 1969). Voca- 
tional counselors and educators have certainly accounted for a large portion of 
this test usage. Unfortunately, educational measurement is directed by
relatively inexact guidelines when compared to measurement practices in science
as a whole. One reason given for this inexactness is that educational measurement
practices lack the objective qualities often associated with the measurement of
physical properties (Wright, 1968).

Degrees of Objective Measurement 

Lacking objectivity. Suppose we are interested in estimating person A's
tool recognition ability. It is often not feasible to directly observe this
ability. Instead, the magnitude of A's ability may be inferred from his responses
to test items which compose Form A of a hypothetical XYZ Tool Recognition Test.
An indicant of his ability is the sum of his item scores when each response is
scored 0 (wrong) or 1 (right). It is useful to transform his total raw score
into a value on a scale having a known distribution and uniform unit of
measurement. However, commonly used transformations such as percentile ranks or
standard scores may lack objectivity. Specifically, such scales tend to be
sample-bound since an estimate of A's ability may change when he is compared to
norms based on different occupational groups. Also, A's scale may be item-bound
since his ability estimate may change if he is administered Form B instead of
Form A of the test.

Possessing objectivity. Consider an example of the measurement of a physical
property. Interpreting the measured length of an object such as a brick does
not depend on the specific set of objects used to calibrate our tape measure.
Likewise, either a 12 inch or 36 inch tape measure may be chosen to estimate
the length of the brick. The concern is that both tape measures have been
calibrated in terms of some common scale to ensure comparable and generalizable
results.

A Model For Objective Measurement 
Criteria for Objectivity 

The interpretation of a person's test score should be sample-free and item- 
free in order to achieve objective educational measurement. Rasch (1960) stated 
conditions for objectivity more formally: 

(a) The scale used to convert raw test scores into useful estimates of ability 
must be generalizable to future test situations without requiring references to 
the characteristics of the sample used to calibrate this scale; 
(b) Any number of different sets of test items selected from a pool of items
which measure a unidimensional psychological attribute must yield similar
estimates of a person's ability.

Rasch developed (1960) and explicated (1961, 1966a, 1966b) a mathematical
model to meet these criteria. This model has been reviewed in depth by various
measurement specialists (among others: Anderson, Kearney, & Everett, 1968;
Birnbaum, 1968).²

Rasch Model³

Theory. The Rasch model is an adaptation from a probabilistic theory of
test performance called latent trait analysis (cf. Lord, 1953). Individual
differences in ability are assumed to exist on some unidimensional variable of
interest. Test items, hypothesized to relate to this variable, are devised and
then administered to a group of examinees. It is also assumed that these items
are scored dichotomously (right or wrong) and that speed does not influence
responses to the test items. Items are selected which, according to a
statistical criterion, meet the assumptions of the model. Retained items are
analyzed to develop a scale for transforming raw scores into ability estimates.

According to the model, answering an item correctly ($O_{ni}$) is a function of
the difficulty ($D_i$) of the item and the ability ($A_n$) of the examinee, or
symbolically:

$$O_{ni} = f(D_i, A_n) \qquad (1)$$

When this relationship is summed over all items, examinees' raw scores are used
to estimate the left side of equation (1). Rasch proposed that the raw
score was a sufficient statistic for estimating the "latent" parameters on the
right side of equation (1). Rasch's estimation procedure will not be elaborated
here, but it can be shown that the difficulty of each item and the ability
estimate associated with each raw score do not change when they are derived
using totally different groups of examinees.
This is not true in classical test theory. Essentially, the model provides a
means of converting raw scores into estimates of ability. In addition, the
scale of ability estimates has been shown to have the properties of a ratio
scale. The importance of these features in a practical testing situation is
shown in the example below.
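
The function f in equation (1) is not written out in this paper. One common way
of stating it, offered here as an assumption about notation rather than as a
quotation from Rasch, treats ability and difficulty as positive quantities on a
ratio scale, with the symbols $A_n$ and $D_i$ standing for the ability of
examinee n and the difficulty of item i:

$$P(O_{ni} = 1) = \frac{A_n}{A_n + D_i}, \qquad \frac{P(O_{ni} = 1)}{P(O_{ni} = 0)} = \frac{A_n}{D_i}$$

Writing $b_n = \ln A_n$ and $d_i = \ln D_i$ (symbols introduced only for this
illustration) gives the equivalent logistic form:

$$P(O_{ni} = 1) = \frac{e^{\,b_n - d_i}}{1 + e^{\,b_n - d_i}}$$

Under this form the odds of a correct response depend only on the ratio of
ability to difficulty, which is the sense in which the ability scale can be
read as a ratio scale.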

Suppose that a hypothetical 60-item test of keypunching ability was
administered to 100 keypunch trainees and that preliminary analyses revealed
that 50 of these items might be scalable. Figure 1 is a raw score conversion
table for this test.

Insert Figure 1 Here

For example, if trainee X received a raw score of 25, then his ability estimate
would be 6.00 plus or minus, naturally, some amount of error. This is not
markedly different from classical test score interpretation. However, the
unique character of the Rasch technique has been theoretically and empirically
supported. If responses from a new sample of trainees with somewhat different
characteristics were analyzed, Rasch showed that the ability estimates derived
for each raw score would be almost identical with those obtained in Figure 1.
This means that the calibration of the ability scale would be freed from the
quirks of the sample chosen for calibration purposes. Also, conversion tables
similar to Figure 1 may be constructed for any number of subsets of the 50
items. Rasch showed that each examinee would receive a similar ability estimate
on each of the tables even though these tables were based on different items.
This means that ability estimation would be independent of the specific items
chosen for measurement purposes. In addition, the ratio ability scale allows
tremendous flexibility in the interpretation of results.
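
How a raw score conversion table might be produced from calibrated items can be
sketched in a few lines of Python. This is a minimal illustration, not the
procedure used to build Figure 1. It assumes the logistic form of the model,
with abilities and difficulties expressed as logarithms, so its values will not
match the 6.00 quoted above, whose scale is not described in the text. The item
difficulties are hypothetical, as are the function names. The ability attached
to a raw score r is taken to be the value at which the expected raw score
equals r, which is the maximum likelihood condition when item difficulties are
treated as known.

```python
import math

def expected_score(ability, difficulties):
    """Expected raw score at a given ability (log scale, logistic form assumed)."""
    return sum(1.0 / (1.0 + math.exp(d - ability)) for d in difficulties)

def ability_for_raw_score(raw, difficulties, lo=-10.0, hi=10.0, tol=1e-6):
    """Bisection search for the ability whose expected raw score equals `raw`."""
    if not 0 < raw < len(difficulties):
        raise ValueError("zero and perfect scores have no finite estimate")
    while hi - lo > tol:
        mid = (lo + hi) / 2.0
        if expected_score(mid, difficulties) < raw:
            lo = mid   # expected score too low: ability must be higher
        else:
            hi = mid
    return (lo + hi) / 2.0

def conversion_table(difficulties):
    """Map each raw score from 1 to L-1 onto an ability estimate."""
    return {r: ability_for_raw_score(r, difficulties)
            for r in range(1, len(difficulties))}

# Hypothetical calibrations: 50 items spread evenly from -2 to +2 (assumed values).
difficulties = [-2.0 + 4.0 * i / 49 for i in range(50)]
table = conversion_table(difficulties)

for raw in (10, 25, 40):
    print(raw, round(table[raw], 2))
```

The same functions can be applied to any subset of the calibrated items to
build a separate table for that subset, which is how alternate conversion
tables of the kind described above could be drawn from one item pool.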
The mathematics encountered in using the Rasch model to infer ability from
raw test scores are tedious. Fortunately, Wright and Panchapakesan (1969) have
developed a computer program to determine the fit of the model to data and,
also, to construct the raw score conversion table.

Empirical tests. The robustness of the Rasch model with respect to
violations of kernel assumptions of the model has been supported by a number of
computerized simulation studies (Brink, 1970; Noonan, 1969; Panchapakesan, 1969).
Tests of the applicability of the model to demographic (Matthiessen, 1965),
personality (Fowler & Bramble, 1972), and civil service (Durovic, 1970) data
have produced encouraging results. The model has also been found adequate with
achievement test data (Kearney, 1966; Tinsley, 1971). A study by Wright (1963)
is outlined below to illustrate the type of research done on the Rasch model.

Wright administered the Law School Admissions Test (LSAT) to a sample of 
1000 college students. The sample was divided at the median test score into 
two groups. Using a Rasch analysis on each group's item responses, two separ- 
ate scales were independently calibrated for the LSAT. Although mean test per- 
formance of both groups differed greatly, the two resulting conversion scales 
were identical. Therefore, the sample-free claims for the Rasch technique were 
supported since such dissimilar groups yielded identical scales. Next, Wright 
separated the 48 item LSAT at the median item difficulty value into two sets of 
24 items. The item-free character of the Rasch model was supported. Dissimilar 
sets of items produced similar ability estimates for all 1000 students. 
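
The logic of a split-sample check of this kind can be imitated with simulated
data. The sketch below is an illustration only: it does not use LSAT data, and
it does not reproduce Wright's estimation procedure. It assumes the logistic
form of the model, generates responses for hypothetical examinees and items,
splits the sample at the median raw score, and calibrates the items separately
in each half. The calibration uses a simple pairwise method swapped in for
brevity: among examinees who answer exactly one of two items correctly, the
ratio of the two possible outcomes depends only on the two item difficulties.
All names and numeric settings are assumptions.

```python
import math
import random

def simulate(abilities, difficulties, rng):
    """Right/wrong (1/0) responses under the assumed logistic form."""
    return [[1 if rng.random() < 1.0 / (1.0 + math.exp(d - a)) else 0
             for d in difficulties] for a in abilities]

def pairwise_difficulties(responses):
    """Item difficulties (centered at zero) from pairwise right/wrong counts."""
    k = len(responses[0])
    # n[i][j]: examinees with item i right and item j wrong (0.5 avoids log(0)).
    n = [[0.5] * k for _ in range(k)]
    for row in responses:
        for i in range(k):
            if row[i]:
                for j in range(k):
                    if not row[j]:
                        n[i][j] += 1
    # log(n[j][i] / n[i][j]) estimates d_i - d_j; average over j to center.
    return [sum(math.log(n[j][i] / n[i][j]) for j in range(k)) / k
            for i in range(k)]

rng = random.Random(0)
true_d = [-1.5 + 3.0 * i / 7 for i in range(8)]          # hypothetical items
abilities = [rng.gauss(0.0, 1.0) for _ in range(20000)]  # hypothetical examinees
data = simulate(abilities, true_d, rng)

# Split the sample at the median raw score, as in the study described above.
scores = sorted(sum(row) for row in data)
median = scores[len(scores) // 2]
low = [row for row in data if sum(row) <= median]
high = [row for row in data if sum(row) > median]

d_low = pairwise_difficulties(low)
d_high = pairwise_difficulties(high)
for t, a, b in zip(true_d, d_low, d_high):
    print(f"true {t:6.2f}   low group {a:6.2f}   high group {b:6.2f}")
```

If the model holds, the two columns of calibrations should agree closely even
though the two groups differ sharply in mean raw score.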

Implications for Occupational Education 
In the development of occupational competency exams on a national scale,
it may be necessary to develop and equate alternate forms of a test. Angoff
(1963) listed the multitude of technical problems associated with this task in
traditional test analysis. But, using the Rasch technique, alternate forms
could easily be constructed since any set of items drawn from a pool of
calibrated items would produce identical information about examinees.

The ratio scale of measurement offered by the Rasch analysis may greatly
facilitate the estimation and interpretation of change in trainee performance.
Using ordinal or interval scales produced by classical test calibration, the
amount of change exhibited by a trainee cannot be described simply as a ratio
of his scale scores (Harris, 1963). However, an appropriate indicant of the
magnitude of the difference between two ratio scaled values is merely the ratio
of these two values (Stevens, 1951). Therefore, using a Rasch ability scale,
a meaningful interpretation can be made of the change in a trainee's performance
from point to point in time in a training program. Also, the magnitude of the
difference between two trainees' ability may be easily estimated.
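
The ratio reading of change can be made explicit under one assumption, namely
that the model is written in the multiplicative form in which the odds of a
correct response on item i equal ability divided by item difficulty, $A / D_i$.
If a trainee's ability is $A$ at the start of training and $A'$ at a later
point (symbols introduced only for this illustration), then for any calibrated
item i:

$$\frac{A' / D_i}{A / D_i} = \frac{A'}{A}$$

The item difficulty cancels, so the ratio $A'/A$ is the factor by which the
trainee's odds of success have changed, whichever item is used to observe it.
For example, $A'/A = 2$ would mean that the odds of a correct response have
doubled on every item in the pool. The same expression, with $A$ and $A'$ taken
as the abilities of two different trainees, compares two people at one point in
time.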

Baker (1971, p. 232) points out that prevalent test theory may be
inadequate in light of recent and anticipated advances in educational technology.
Individualized instruction in vocational education may also demand individualized
attention in evaluation. Rigidly defined groups of examinees, times of testing,
and tests themselves may be inappropriate in the future. It may be necessary
to tailor the test to the individual trainee who, for example, may have reading
difficulties or other perceptual handicaps. The item-free character of the Rasch
model would be very appealing in these circumstances. Using a pool of
multi-modal items calibrated by the Rasch technique in conjunction with tailored
testing routines outlined by Lord (1971), the jump to a dynamic, individualized,
and, possibly, computerized testing situation is not extreme.
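
What a tailored testing routine over a calibrated pool could look like is
sketched below. This is a generic up-and-down routine written for illustration,
not one of the specific procedures outlined by Lord (1971). The item pool, the
step sizes, and the simulated trainee are all hypothetical, and the final value
is a rough working estimate rather than a formal Rasch ability estimate. The
point is only that, once items share a common difficulty scale, each trainee
can be given a different set of items chosen to suit his current estimate.

```python
def next_item(pool, ability, used):
    """Index of the unused item whose difficulty is closest to the estimate."""
    candidates = [i for i in range(len(pool)) if i not in used]
    return min(candidates, key=lambda i: abs(pool[i] - ability))

def tailored_test(pool, answer, n_items=10, start=0.0, step=1.0):
    """Move the estimate up after a right answer and down after a wrong one."""
    ability, used = start, set()
    for _ in range(n_items):
        item = next_item(pool, ability, used)
        used.add(item)
        ability += step if answer(pool[item]) else -step
        step = max(step * 0.7, 0.1)  # shrink the step so the estimate settles
    return ability, sorted(used)

# Hypothetical pool of 25 calibrated items with difficulties from -3 to +3.
pool = [-3.0 + 0.25 * i for i in range(25)]

# Hypothetical trainee who answers correctly whenever difficulty is at most 1.0.
answer = lambda difficulty: difficulty <= 1.0

estimate, administered = tailored_test(pool, answer)
print(round(estimate, 2), administered)
```

A deterministic trainee is used here only to keep the example reproducible; a
real examinee's responses would be probabilistic, and a routine of this kind
would normally be followed by a proper ability estimate based on the items
actually administered.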

Summary and Conclusions 

The Rasch model represents a significant step in the refinement of objective
educational measurement. The properties of the Rasch analysis suggest solutions
to a number of measurement problems in occupational education, including
developing and equating alternate forms of a test and, also, estimating and
interpreting changes in trainee performance. The item-free characteristics of
this measurement model may allow the development of individually tailored tests.

Enthusiasm for the advantages offered by the Rasch model must be tempered, 
however. The amount of research completed on this model is minute when compared 
to the wealth of theoretical and empirical work done on classical test theory. 
The applicability of the Rasch model must be more closely examined by applying 
the model in a wide range of practical situations. 
Footnotes
²A bibliography of theoretical and empirical studies is available from the author
upon request.

³This intuitive treatment may beg substantiation. The interested reader may find
Rasch's publications less mathematically anemic.
FIG. 1. A TYPICAL RASCH RAW SCORE CONVERSION TABLE.



